AN1384 Application note
Implementing an acoustic echo canceller algorithm using the ST120 DSP
by Aude Andolfatto
October 2001

1 - Introduction

This application note describes an acoustic echo canceller (AEC; 16 bits, 8 kHz, 256 ms) implementation on the ST120 DSP. Chapter 2 presents the general principle of an AEC based on NLMS (normalized least mean squares) filtering. Chapters 3, 4 and 5 detail the main parts of an AEC: NLMS filtering, power estimation and speech detection. Chapters 7 and 8 provide results for the C floating-point and C fixed-point implementations. Chapter 9 describes the assembly implementation of the NLMS function and gives results in terms of MCPS (mega cycles per second of speech) and code size.
Table of contents

1 Introduction
2 AEC algorithm
  2.1 General purpose
  2.2 NLMS algorithm
  2.3 Restrictions
3 NLMS algorithm
4 Signal power estimation
5 Speech detection
  5.1 Introduction
  5.2 Far-end speech detection
  5.3 Double talk detection
  5.4 Near-end speech detection
  5.5 Hangover counters
  5.6 Speech detection program flow
6 Measurements
  6.1 Algorithm efficiency
    6.1.1 Input test files
    6.1.2 Performances assessment
  6.2 Tools description
  6.3 Benchmark parameters
    6.3.1 Number of (mega) cycles per second of speech (MCPS)
    6.3.2 Code size
7 C floating point implementation
  7.1 Role
  7.2 Program flow
  7.3 Convergence properties
    7.3.1 Tests in quiet environment
    7.3.2 Tests during double-talk conditions
  7.4 Bench results
8 C fixed point implementation
  8.1 Program flow
  8.2 C prototypes
  8.3 Convergence properties
    8.3.1 Tests in quiet environment
    8.3.2 Tests during double-talk conditions
  8.4 Bench results
    8.4.1 Front end
    8.4.2 NLMS filtering
9 Improving performances
  9.1 First step
  9.2 Second step
10 Conclusion
11 Annex
  11.1 ASM generated by GHS+LAO
2 - AEC algorithm

2.1 - General purpose

The coupling between a loudspeaker and a microphone generates some specific problems. The loudspeaker signal is picked up by the microphone and transmitted back to its origin. As a result, the far-end participant perceives it as an echo. The longer the transmission delay, the more disturbing the acoustic echo. To eliminate the acoustic feedback, an echo canceller is introduced in the loudspeaker. By simulating the acoustic echo path, the echo cancellation filter synthesizes a replica of the echo signal, which is subtracted from the signal on the return path.

Figure 1: Echo canceller configuration (block diagram: far-end signal x(n), echo path H(z), near-end speech u(n), microphone signal r(n), estimated echo path $\hat{H}(z)$ driven by the adaptive algorithm, estimated echo $\hat{r}(n)$, residual echo e(n))

Table 1: Echo canceller configuration terminology

  n              Sample time
  x(n)           Far-end speech
  u(n)           Near-end speech
  r(n)           Signal received by the microphone, also called "near-end signal"
  $\hat{r}(n)$   Estimated echo
  e(n)           Residual echo
  H(z)           Echo path
  $\hat{H}(z)$   Estimated echo path
2.2 - NLMS algorithm

Several adaptive algorithms have been proposed for acoustic echo cancellers. One of them, the widely used NLMS algorithm, attempts to minimize the expected value of the squared error (residual echo). It starts from an initial (arbitrary) value of the tap weight vector, which is improved at each iteration. See the NLMS algorithm chapter (chapter 3) for more information.

2.3 - Restrictions

The main difficulty for an AEC is the double-talk condition, where both operators are speaking at the same time. If not detected, double talk (DT) can cause divergence of the adaptive algorithm. Moreover, when the far-end operator is silent and the near-end operator is talking, the filter must not be adapted because the near-end signal is no longer an echo. The following table summarizes the AEC principle:

Table 2: AEC conditions and actions

  Conditions              Actions
  Far-end speech alone    NLMS algorithm: filtering and update
  Double talk             Filtering but no update
  Near-end speech         Residual echo = near-end signal
3 - NLMS algorithm

The following steps constitute the NLMS algorithm:

- Adaptive filter output: $\hat{r}(k) = w(k)\,x(k)$
- Estimation error: $e(k) = r(k) - \hat{r}(k)$
- Tap weight update: $w_k(n+1) = w_k(n) + \mu \frac{e(n)}{p(n)}\,x(n)$

where $\mu \frac{e(n)}{p(n)}$ is the NLMS constant at a given sample time n, $\mu$ is the step size and p(n) is the estimated power of far-end speech at sample time n. The short window power estimate of far-end speech (fes_short_pwr) is used to normalize the step size. The generic equation for estimating the average power is:

$p(n) = (1 - \rho)\,p(n-1) + \rho\,x^2(n)$, where $\rho$ is a constant.

NLMS not only performs FIR filtering, but also updates the filter coefficients.
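The three steps above can be sketched in floating-point C as follows. This is only an illustration under stated assumptions, not the ST120 implementation: the function name nlms_step, the tap count NTAPS, the delay-line layout (newest sample first) and the small constant EPS guarding the division are all introduced for the example.

```c
#include <stddef.h>

#define NTAPS 4        /* illustrative only; the note uses 256 taps */
#define EPS   1e-6f    /* assumed guard against division by zero */

/* One NLMS iteration.
 *   w  : tap weights, updated in place
 *   x  : far-end delay line, x[0] is the newest sample
 *   r  : microphone (near-end) sample r(n)
 *   p  : estimated far-end power p(n)
 *   mu : step size
 * Returns the residual echo e(n) = r(n) - r_hat(n). */
float nlms_step(float w[NTAPS], const float x[NTAPS],
                float r, float p, float mu)
{
    float r_hat = 0.0f;
    for (size_t k = 0; k < NTAPS; k++)
        r_hat += w[k] * x[k];          /* adaptive filter output */

    float e = r - r_hat;               /* estimation error */

    float g = mu * e / (p + EPS);      /* normalized step mu*e(n)/p(n) */
    for (size_t k = 0; k < NTAPS; k++)
        w[k] += g * x[k];              /* tap weight update */

    return e;
}
```

Called once per sample with a constant input, the returned error shrinks from call to call, which is the convergence behaviour the measurements in the later chapters quantify.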
4 - Signal power estimation

The signal power estimation is used in the AEC to normalize the loop gain (step size). In addition, the power estimation outputs are used by the speech detection function to determine the sequence of operations to be performed in that function. The following equation uses the squared input to calculate the signal power:

$p(n) = (1 - \alpha)\,p(n-1) + \alpha\,x^2(n)$

where

- $\alpha = \frac{1}{32}$ for a very short window power estimate, 4 ms
- $\alpha = \frac{1}{128}$ for a short window power estimate, 16 ms
- $\alpha = \frac{1}{16384}$ for a long window power estimate, 2048 ms

A different value is chosen for each window size. The far-end signal power (fes_short_pwr) is estimated with a short window of 16 ms. This estimate is used in the NLMS algorithm to normalize the step size. A window of 4 ms is used to determine the very short power estimates of near-end power (nes_vshort_pwr) and far-end power (fes_vshort_pwr). These estimates are used in far-end and near-end speech detection. The speech detectors must be accurate, because erroneous detection could lead to an unstable system.
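The recursive estimator above amounts to one multiply-accumulate per sample. A minimal floating-point sketch follows; the function name power_update and the macro names are assumptions made for the example, while the three constants are the ones listed in the text.

```c
/* Smoothing constants from the text, valid for an 8 kHz sampling rate. */
#define ALPHA_VSHORT (1.0f / 32.0f)      /* 4 ms window    */
#define ALPHA_SHORT  (1.0f / 128.0f)     /* 16 ms window   */
#define ALPHA_LONG   (1.0f / 16384.0f)   /* 2048 ms window */

/* p(n) = (1 - alpha) * p(n-1) + alpha * x(n)^2 */
float power_update(float p_prev, float x, float alpha)
{
    return (1.0f - alpha) * p_prev + alpha * x * x;
}
```

The window length in samples is 1/alpha: 32 samples at 8 kHz correspond to 4 ms, 128 samples to 16 ms and 16384 samples to 2048 ms. For a constant input the estimate converges to the squared input value.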
5 - Speech detection

5.1 - Introduction

Speech detection is a very important part of the AEC. It must be done before the software can determine whether to filter, update or freeze the adaptive filter. There are three speech detectors:

- Far-end speech detector
- Double-talk detector
- Near-end speech detector

The speech detection software always checks for the presence of far-end speech first, then goes to double-talk detection. It performs double-talk detection even if it does not detect far-end speech. This avoids false detection due to a low far-end speech level. If the software detects neither far-end speech nor double-talk, it goes to near-end speech detection. All detection is based on the signal power estimate algorithm, which is discussed in detail in the signal power estimation chapter (chapter 4).

5.2 - Far-end speech detection

Far-end speech means that only the far-end speaker is active. This is the only time the AEC program performs both filtering and updating. The very short power estimates of the far-end and near-end speech signals are used to determine whether far-end speech is present. Far-end speech is detected only if

fesvshortpwr > fesmargin + nesvshortpwr

where fesmargin is a threshold constant. The value of the threshold constant must be chosen carefully from real-time experiments. If the threshold value is too small, the background noise picked up by the microphone results in a false detection. On the other hand, if the threshold value is too large, part of the speech is not transmitted when the speech signal levels are low.

5.3 - Double talk detection

In AEC algorithms, the presence of both far-end speech and near-end speech is known as double-talk. The following equation defines a double-talk detector based on echo return loss enhancement (ERLE):

$\mathrm{ERLE} = 10 \log_{10}\left(\frac{p_x^2(n)}{p_e^2(n)}\right)$

where ERLE equals 8 dB (chosen from real-time experiments), $p_u(n)$ is the short window power estimate of the near-end signal and $p_e(n)$ is the short window power estimate of the residual error signal. Double-talk is detected if [inequality not recoverable from the source text], where $c = 10^{\mathrm{ERLE}/10}$ and d is a threshold constant determined from real-time experiments. The higher the value of d, the less often double-talk is detected and the more often the coefficients are updated. In addition, a higher threshold constant results in a greater difference between the echo and the echo replica, which means less echo cancellation, but it also results in less noise interference. After double-talk is detected, the program freezes the FIR filter coefficient updates; filtering is still done, and the double-talk hangover counter is reset to its high value (see the hangover counters section for more information).
5.4 - Near-end speech detection

Near-end speech exists when there is no far-end speech and no double-talk. Near-end speech is detected by comparing the very short power estimate and the long power estimate of the near-end signal as follows:

nesvshortpwr > nesmargin + neslongpwr

where nesmargin is a threshold constant chosen from real-time testing. If near-end speech is detected, the program sets the near-end speech mode bit and freezes the LMS adaptive filter function. Indeed, when the far-end operator is silent and the near-end operator is talking, the filter must not be adapted because the near-end signal is no longer an echo.

5.5 - Hangover counters

Two hangover counters are used in the speech detection algorithm:

- dt_hang
- nes_hang

Each hangover counter is set to a hangover time of 400 samples, or 50 ms at a sampling frequency of 8 kHz, after its corresponding type of speech is detected. If a type of speech is not detected, its hangover counter is decreased by one. Table 3 shows how the counters determine when to do filtering, updating, filtering and updating, or nothing. Hangover counters play a very important role in AEC algorithms. After each type of speech is detected, its corresponding mode bit is set. For example, the far_speech mode bit is set to 1 to indicate that only far-end speech is detected. The aec_update mode bit is not set to 1 until the double-talk and near-end hangover counters are both less than zero. This method avoids erroneous detection and gives some buffer time before the adaptive filter is turned on.

Table 3: Hangover counters and LMS mode bit settings

  Hangover counters               Set filtering mode bit   Set updating mode bit
  dt_hang >= 0, nes_hang >= 0     Yes                      No
  dt_hang >= 0, nes_hang < 0      Yes                      No
  dt_hang < 0,  nes_hang >= 0     No                       No
  dt_hang < 0,  nes_hang < 0      Yes                      Yes
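The counter handling and the mode bit settings of Table 3 can be sketched in C as below. The structure aec_state and the functions hang_tick and control_lms are hypothetical stand-ins written for this example (the note's own routine is called controllms, and its body is not shown in the text). Stopping the countdown at -1 is an added safeguard, not something the note specifies.

```c
#define HANGOVER 400   /* 50 ms at 8 kHz, as stated in section 5.5 */

typedef struct {
    int dt_hang;        /* double-talk hangover counter */
    int nes_hang;       /* near-end speech hangover counter */
    int aec_filtering;  /* 1: run the FIR filter */
    int aec_update;     /* 1: adapt the coefficients */
} aec_state;

/* Reload the counter when its speech type is detected, otherwise count
 * down. The countdown stops at -1 so the counter can never wrap around. */
void hang_tick(int *counter, int detected)
{
    if (detected)
        *counter = HANGOVER;
    else if (*counter >= 0)
        (*counter)--;
}

/* Mode bits as listed in Table 3. */
void control_lms(aec_state *s)
{
    if (s->dt_hang >= 0) {            /* double-talk hangover running */
        s->aec_filtering = 1;
        s->aec_update = 0;
    } else if (s->nes_hang >= 0) {    /* near-end hangover running */
        s->aec_filtering = 0;
        s->aec_update = 0;
    } else {                          /* both counters expired */
        s->aec_filtering = 1;
        s->aec_update = 1;
    }
}
```

Updating is only re-enabled once both counters have dropped below zero, which is the buffer time described above.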
5.6 - Speech detection program flow

The following flowchart shows the speech detection program flow.

Figure 2: Speech detection flowchart (input: data from power estimate; decision nodes: far speech?, double talk?, near speech?, dt_hang >= 0?; when double talk is detected: double_talk=1, far_speech=0, near_speech=0, nes_hang=600, dt_hang=600; when near speech is detected: near_speech=1, far_speech=0, double_talk=0, nes_hang=600; otherwise the flags are cleared and the counters decremented with nes_hang-- and dt_hang--; every branch ends in controllms(), which sets aec_filtering and aec_update)
6 - Measurements

6.1 - Algorithm efficiency

6.1.1 - Input test files

Input test files consist of:

- A reference wave file (far-end speech) and its associated near-end signal (sum of near-end speech and echo) in which the near-end speaker is silent, so that the near-end signal contains only echo.
- The same reference file and a new associated near-end signal in which, this time, the near-end speaker does not remain silent. In this file, double-talk is present.

6.1.2 - Performances assessment

The assessment is quite similar to the one recommended by the ETSI. The terminal coupling loss (TCL) is computed as follows:

$\mathrm{TCL} = 10 \log_{10}\left(\frac{\sum_n x^2(n)}{\sum_n e^2(n)}\right)$

It gives the total attenuation between the receiving port and the sending port of the far-end side.

6.2 - Tools description

For each compilation, the Green Hills compiler used is the one present in the ghs-st100-2.1-01 MULTI 2000 environment. This compiler provides options that improve performance and code size. The "speed optimization" (-olami) and the "disable loop unrolling" (-onounroll) options are used. Using LAO (linear assembly optimizer [1]) improves performance further: it is able to exploit instruction-level parallelism and automatically manages operator expansion, register allocation and loop optimization. LAO is used as follows: the assembly file generated during the GHS compilation (with options) is renamed file.lai (the input file for LAO). LAO then optimizes this file and generates a new "optimized" assembly file that can be used for GHS compilation. The options used are: -osliw -ounroll

6.3 - Benchmark parameters

6.3.1 - Number of (mega) cycles per second of speech (MCPS)

To measure performance, breakpoints were set in the assembly before and after the function calls. In the debugger window, the "cycle" command is then called at each breakpoint. The difference between the two returned values gives the number of cycles needed to perform the function (call and rts included). This number is then multiplied by 8000 (the sampling rate is 8 kHz) and expressed in millions of cycles to obtain the number of MCPS.

6.3.2 - Code size

The code size indicated in the following tables is the size given by the "gfunsize" command. This command, which takes the object file as argument, returns the number of bytes for each function.
7 - C floating point implementation

7.1 - Role

Given the dynamic range and precision of floating-point variables, the aim of this implementation is to obtain a reference system. All further fixed-point implementations (asm, lai, C ...) should have the same properties (convergence, same output file for the same input files ...).

7.2 - Program flow

The following figure presents the process developed. As described in the previous chapters, the powerestimate function is used to estimate powers with different window lengths. These powers are used first as parameters for the speechdetector function and then to update the step size (updateerf function). The speechdetector function determines what kind of signal the system is receiving (far-end signal, near-end speech, double-talk ...) and whether it has to filter and/or update.

Figure 3: Program flow (powerestimate, speechdetector, updateerf, then filtering and/or update or nothing, then compute output)

7.3 - Convergence properties

Note: the following results were obtained with constants and thresholds tuned to give the best results with these files. Testing the system on a wide range of test files is needed to fully validate the values of the constants and thresholds.

7.3.1 - Tests in quiet environment

As specified by the ETSI, the measurements were taken after 10 seconds of speech to allow the algorithm to reach a steady state. The result is 28.45 dB. The initial convergence time is good: the TCL almost reaches its maximum value after two seconds.
The following figure represents the TCL (in dB) versus the number of samples.

Figure 4: TCL (in dB) versus the number of samples in quiet environment

7.3.2 - Tests during double-talk conditions

The file used for the test was composed of a single-talk period of 4 seconds, followed by a succession of double talk and near-end speech for 3 seconds, after which the single-talk condition returned. When double-talk occurred, the TCL computation was reinitialized. The following figure shows that the filter is no longer adapted to the signal during the double-talk period. The recovery time after double-talk is excellent: the TCL resumes its average value within 2-3 seconds.

Figure 5: TCL (in dB) versus the number of samples in double-talk conditions
7.4 - Bench results

This section presents the best results. GHS options are -olami -onounroll and LAO options are -ounroll=1. A library of floating-point intrinsics was used (fast float addition, fast float MAC, fast float multiplication and division). Note that the call and rts instructions are included in all MCPS results. However, the cycle counts are large enough for the impact of the call/rts instructions to be negligible.

Table 4: Bench results

  Function name     Nb of cycles/new sample   Code size (bytes)
  initoutpower      458                       236
  powerestimate     2163                      376
  speechdetector    1133                      976
  controllms        -                         144
  nlmsfiltering     40258                     176
  calcupdate        393                       88
8 - C fixed point implementation

8.1 - Program flow

The following flowchart describes the AEC process. The difference from the floating-point version is the overflow test. It is important to check for overflows because, if left untreated, they could make the algorithm diverge.

Figure 6: AEC flowchart (powerestimate, updateerf, speechdetector; decision "adaptive filtering?": if yes, NLMS update/filter; decision "overflow?": if yes, output = 0, if no, compute residual echo)
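The overflow branch of the flowchart can be illustrated with the following sketch. The note does not state where exactly the fixed-point code tests for overflow, so this example makes an assumption: the residual echo is formed in 32 bits and the output is forced to zero when the result does not fit in 16 bits. The function name residual_or_zero is hypothetical.

```c
#include <limits.h>

/* Residual echo e = r - r_hat, computed in 32 bits.
 * Sets *overflow to 1 and returns 0 when the result does not fit in a
 * 16-bit sample; otherwise sets *overflow to 0 and returns the result. */
short residual_or_zero(short r, short r_hat, int *overflow)
{
    int e = (int)r - (int)r_hat;
    *overflow = (e > SHRT_MAX || e < SHRT_MIN);
    return *overflow ? 0 : (short)e;
}
```

Forcing the output to zero drops one sample but keeps a wrapped value from entering the power estimates and the coefficient update.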
8.2 - C prototypes

Table 5 through Table 8 show the prototypes used for the LMS adaptive filter function, the power estimate function, the speech detection function and the calc erf function.

Table 5: Prototype for AEC NLMS adaptive filter function

  Syntax:       void nlms_filtering(short *input, short erf, short out)
  Parameters:   *input: table containing the last 255 input samples + the new input sample
                erf: normalized error times step size
                out: output sample
  Description:  This function is called to adapt the filter coefficients and calculate an FIR filter output by using the input samples and the previous error.

Table 6: Prototype for AEC power estimate function

  Syntax:       void powerestimate(short fe, short ne, short error, int *fepwr, int *nepwr, int errpwr)
  Parameters:   fe: new far-end signal sample
                ne: new near-end signal sample
                error: new error signal sample
                *fepwr: pointer to far-end powers (short, vshort, long power estimates)
                *nepwr: pointer to near-end powers (short, vshort, long power estimates)
                errpwr: short error power estimate
  Description:  This function is called to estimate the signal power, which is used in speech detection and to normalize the step size.

Table 7: Prototype for AEC speech detection function

  Syntax:       void speechdetector(void)
  Parameters:   none
  Description:  This function is called to detect far-end speech, near-end speech and double-talk. After each detection the corresponding mode bit is set.

Table 8: Prototype for AEC update erf function

  Syntax:       void updateerf(int *fepwr, short out, short ne, short erf)
  Parameters:   *fepwr: pointer to far-end powers; only fes_short_pwr is used
                out: calculated output sample
                ne: new near-end sample
                erf: new normalized error times step size
  Description:  This function is used to calculate the new erf.
8.3 - Convergence properties

Note: the constants used first were short integers corresponding to the floating-point values used in the floating-point system. The results in terms of TCL were nearly identical to the floating-point results: the TCL after 10 seconds of single talk was 28.42 dB and the recovery time after double-talk was 2-3 seconds. The constants and thresholds were then modified to find the best results. The following results were obtained with those new values.

8.3.1 - Tests in quiet environment

As specified by the ETSI, the measurements were taken after 10 seconds of speech to allow the algorithm to reach a steady state. The result is 34.48 dB. The initial convergence time is also good: as in the floating-point system, the TCL nearly attains its maximum value after two seconds. The following figure represents the TCL in dB versus the number of samples.

Figure 7: TCL in dB versus the number of samples in quiet environment

8.3.2 - Tests during double-talk conditions

The file used for the test was composed of a single-talk period of 3 seconds, followed by a succession of double talk and near-end speech for 1 second, after which the single-talk condition returned for 2 seconds. Double-talk then occurred for 1 second and single-talk returned. When double-talk occurred, the TCL computation was reinitialized. The following figure shows that the filter is no longer adapted to the signal during the double-talk period. The recovery time after double-talk is excellent: the TCL resumes its average value within 2-3 seconds.

Figure 8: TCL in dB versus the number of cycles in double-talk conditions
8.4 - Bench results

This section presents the best results. They are obtained with the ghs-st100-2.1-01 MULTI 2000 environment and LAO. GHS options are -olami -onounroll, as specified in chapter 6.

8.4.1 - Front end

This module contains all functions except filtering: powerestimate, speechdetector, updateerf ... LAO options are -ounroll=1. Note again that the call and rts instructions are included in all MCPS results. Since the number of cycles for each function is quite small, it is more appropriate to use the traceviewer utility to obtain accurate results. The following results (in terms of MCPS) are obtained by measuring the number of cycles from the decode step of the first instruction to the decode step of the poprts for each function.

Table 9: Front end bench results

  Function name     Nb of cycles/new sample   Code size (bytes)
  powerestimate     27                        164
  speechdetector    145                       736
  controllms        -                         124
  calcupdate        118                       104

The total number of cycles for the front-end module is 27 + 145 + 118 = 290 cycles/sample. The maximum computational cost is therefore 290 * 8000 = 2.32 million cycles per second, i.e. 2.32 MCPS.

8.4.2 - NLMS filtering

LAO options are:

- LAO1: -ounroll=2
- LAO2: -ounroll=2 -osliw

The code size of the function is obtained with the "gfunsize" utility. Before explaining how the number of cycles is obtained, recall how the filtering function behaves. After loading new data into the data buffer, the function performs a multiplication and accumulation loop that updates the coefficients and shifts the entire data buffer down by one. It calculates each new coefficient value by multiplying the old data by erf and adding the result to the old coefficient. Next, it overwrites the old coefficient value with the new value. The updated coefficients are then used to calculate the filter output.

Table 10: Number of cycles of NLMS filtering

                  GHS+LAO1   GHS+LAO2
  Nb of cycles    1795       1286

Table 11: Code size of NLMS filtering

                  GHS+LAO1   GHS+LAO2
  Code size       216        332
The C code is presented below:

    void f_nlmsfiltering( short *pswork, short *s_erf, short *psoutsig )
    {
        int temp;    /* filter output accumulator */
        int iind;    /* tap index */
        int tempo;   /* coefficient correction */

        for ( temp = 0, iind = 0; iind < win_len; iind++ )
        {
            /* correction = data sample times erf */
            tempo = __mpfrch(pswork[iind + 1], (*s_erf));
            /* overwrite the old coefficient with the new value */
            sfilter[iind] = sfilter[iind] + (short)tempo;
            /* accumulate the filter output with the updated coefficient */
            temp = __mafcw( temp, sfilter[iind], pswork[iind] );
        }
        *psoutsig = (short)(temp >> 16);
    }

The main part is the loop of win_len (= 256, i.e. 32 ms at a frequency of 8 kHz) iterations. That is why the MCPS results are obtained with the "cycle" command, called in the generated assembly before the beginning of the loop and at the end of the loop. The difference between the two results gives the number of cycles needed to perform the entire loop (with 2-unrolling, it means 64 iterations). Consequently, the numbers presented include neither the call and rts instructions nor the loop initialization. The best case is 1286 * 8000 = 10.3 MCPS. The generated assembly is presented in the annex (section 11.1).
9 - Improving performances

9.1 - First step

GHS and LAO do not take into account packed arithmetic [2], which could be widely used in the NLMS filtering module. It is therefore necessary to rewrite this function in assembly. To take advantage of the SLIW functionality of the ST120, one iteration of the nlmsfiltering loop processes 2 samples of the data buffer (instead of one). As a result, the operations described in paragraph 8.4 fit perfectly into 3 SLIW groupings, which should lead to 3 cycles for two input data samples, or 3/2 = 1.5 cycles for one input data sample. The nlmsfiltering function should theoretically be performed in 1.5*N cycles, N being the number of filter taps. The nlmsfiltering function written in assembly [4] is presented below.

    .text
    f_nlmsfiltering:
        .align 8
        makec lc0 , 128
        setuls0 loop_start
        makea pr , %abs16to31(sfilter)
        setle0 loop_end-16
        addha p0 , p0 , 2
        morea pr , %abs0to15(sfilter)
        make r0 , 0
        make r15 , 0
        ldh r14 , @( p1 + 0 )
        .align 8
        .presliw
        sliwmd
    loop_start:
        ldf r12 , @( pr + 0 )
        ldw r1 , @( p0 !+ 4 )
        mafrcll r13, r12, r1 , r14
        nop
        ldh r2 , @( p0 - 6 )
        ldf sr, @( pr + 2 )
        mafrchl sr, sr, r1 , r14
        nop
        mafchl r0 , r0 , r13 , r2
        mafchl r15, r15, sr, r1
        sdf @(pr !+ 2), r13
        sdf @(pr !+ 2), sr
    loop_end:
        nop
        nop
        nop
        gp32md
        addcw r0 , r0 , r15
        subha p0 , p0 , 2
        sdf @( p2 + 0 ) , r0
        rts
        .leave f_nlmsfiltering
In order to check our 1.5-cycle loop, a breakpoint is set after the beginning of the loop and another after the end of the loop. The "cycle" command is called at each breakpoint and the difference between the two results gives the number of cycles needed to perform the whole loop. For a 32 ms filter length (number of taps N = 256), the difference is 521 cycles for 128 iterations (N/2). An iteration therefore needs about 4 cycles in practice (521/128 = 4.07), so that the processing of one input data sample takes 2 cycles. The nlmsfiltering function thus needs 2 * N cycles. The complete filtering function takes 535 cycles per sample, i.e. 535 * 8000 = 4.28 million cycles per second (about 4.3 MCPS).

Table 12: nlmsfiltering function performance

  Function         Nb cycles/sample   Code size
  nlmsfiltering    535                192

9.2 - Second step

In this section, a new way of writing the function is developed in order to get a real 1.5*N cycle loop. Recall that the output result is the sum of the intermediate terms sfilter[i]*pswork[i]. The output is computed at the same time as the sfilter[i]. The first iteration of the loop is the following (temp is initialized to zero):

    sfilter[0] = sfilter[0] + pswork[1] * serf
    temp = temp + sfilter[0] * pswork[0]         # wait for the sfilter[0]
    sfilter[1] = sfilter[1] + pswork[2] * serf
    temp = temp + sfilter[1] * pswork[1]         # wait for the sfilter[1]

To avoid the latencies due to the two cycles needed for MAC operations (cf. #), the process is desynchronized. This means that sfilter[0] and sfilter[1] are computed outside the loop, temp is set to sfilter[0]*pswork[0], and the first iteration of the loop performs the following operations:

    Initialisations:
    sfilter[0] = sfilter[0] + pswork[1] * serf
    sfilter[1] = sfilter[1] + pswork[2] * serf
    temp = temp + sfilter[0] * pswork[0]

    First iteration of the loop:
    sfilter[2] = sfilter[2] + pswork[3] * serf
    temp = temp + sfilter[1] * pswork[1]         # sfilter[1] is ready
    sfilter[3] = sfilter[3] + pswork[4] * serf
    temp = temp + sfilter[2] * pswork[2]         # sfilter[2] is ready

When sfilter[2] is computed, the second intermediate term of the output is computed. When sfilter[3] is computed, the third intermediate term of the output is computed, and so on. With such a scheme, the sfilter[i] to be used during an iteration is ready when expected. The C-level version of the process is presented further on, after the measured results.
As in paragraph 9.1, an iteration of the loop fits exactly into 3 SLIW bundles, but this time without any latencies, which should lead to a real 1.5N-cycle loop. The difference between the cycle counts returned by the "cycle" command at the beginning and at the end of the loop is now 390. This means that an iteration of the loop is performed in 390/128 = 3.05 cycles, so the processing of one input sample really takes about 1.5N cycles. The filtering function is performed in 402 * 8000 = 3.2 MCPS.

Table 13: nlmsfiltering function performance

    Function         Nb cycles/sample    Code size
    nlmsfiltering    402                 198

    /* sfilter[] and win_len are defined elsewhere in the application. */
    void f_nlmsfiltering( short *pswork, short *s_erf, short *psoutsig )
    {
        int temp;                     /* output accumulator */
        int iind;                     /* tap index */
        int tempo;                    /* coefficient update term */
        short adaptrate;
        short *ptr = NULL;
        short filtprec1, filtprec2;   /* coefficients carried between iterations */
        short ptrprec1, ptrprec2;     /* input samples carried between iterations */

        ptr = &pswork[1];
        adaptrate = *s_erf;

        /* Initialisations: update sfilter[0] and sfilter[1] outside the loop */
        tempo = __mpfrch(pswork[1], (*s_erf));
        sfilter[0] = sfilter[0] + (short)tempo;
        tempo = __mpfrch(pswork[2], (*s_erf));
        sfilter[1] = sfilter[1] + (short)tempo;

        /* First output term: temp = sfilter[0] * pswork[0] */
        temp = __mpfcw( sfilter[0], pswork[0] );

        filtprec1 = sfilter[1];
        filtprec2 = sfilter[2];
        ptrprec1 = ptr[0];
        ptrprec2 = ptr[1];

        /* Two taps per iteration; the output terms lag the updates by one step */
        for ( iind = 2; iind < (win_len); iind = iind + 2 )
        {
            tempo = __mpfrch(ptr[iind], adaptrate);
            sfilter[iind] = filtprec2 + (short)tempo;
            temp = __mafcw( temp, filtprec1, ptrprec1 );

            tempo = __mpfrch(ptr[iind + 1], adaptrate);
            sfilter[iind+1] = sfilter[iind+1] + (short)tempo;
            temp = __mafcw( temp, sfilter[iind], ptrprec2 );

            filtprec1 = sfilter[iind+1];
            filtprec2 = sfilter[iind+2];   /* on the last pass this value is read but never used */
            ptrprec1 = ptr[iind];
            ptrprec2 = ptr[iind+1];
        }

        /* Last output term, left over by the one-step lag */
        temp = __mafcw( temp, sfilter[win_len-1], pswork[win_len-1] );
        *psoutsig = temp >> 16;
    }
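The reordering described in this section can be checked on any host with a portable model. The sketch below is not the ST120 code: it replaces the __mpfrch and __mafcw intrinsics with plain integer arithmetic (an assumed truncating Q15 product, with no rounding and no saturation), and the names q15_mul, nlms_straight and nlms_pipelined are hypothetical. It only illustrates that accumulating the output terms one step behind the coefficient updates gives the same coefficients and the same accumulated output as the straightforward ordering.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed Q15 product: (a * b) / 2^15, truncated toward zero.
 * This is a stand-in for illustration, not a model of the ST120 intrinsics. */
static int16_t q15_mul(int16_t a, int16_t b)
{
    return (int16_t)(((int32_t)a * (int32_t)b) / 32768);
}

/* Straightforward ordering: each output term is accumulated right after
 * the coefficient it depends on has been updated.
 * w holds n coefficients, x holds n + 1 input samples. */
int32_t nlms_straight(int16_t *w, const int16_t *x, int16_t step, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        w[i] = (int16_t)(w[i] + q15_mul(x[i + 1], step));
        acc += (int32_t)w[i] * x[i];
    }
    return acc;
}

/* Desynchronized ordering: coefficients 0 and 1 are updated before the loop,
 * and inside the loop the first accumulation uses the coefficient finished
 * in the previous step. n must be even and at least 2. */
int32_t nlms_pipelined(int16_t *w, const int16_t *x, int16_t step, size_t n)
{
    assert(n >= 2 && n % 2 == 0);

    /* Prologue: update taps 0 and 1, start the accumulator with tap 0. */
    w[0] = (int16_t)(w[0] + q15_mul(x[1], step));
    w[1] = (int16_t)(w[1] + q15_mul(x[2], step));
    int32_t acc = (int32_t)w[0] * x[0];

    for (size_t i = 2; i < n; i += 2) {
        w[i] = (int16_t)(w[i] + q15_mul(x[i + 1], step));
        acc += (int32_t)w[i - 1] * x[i - 1];   /* tap i-1 is already final */
        w[i + 1] = (int16_t)(w[i + 1] + q15_mul(x[i + 2], step));
        acc += (int32_t)w[i] * x[i];           /* tap i was updated above */
    }

    /* Epilogue: the last tap has been updated but not yet accumulated. */
    acc += (int32_t)w[n - 1] * x[n - 1];
    return acc;
}
```

Calling both functions on identical copies of a coefficient array and the same input buffer should return equal accumulators and leave equal coefficients. The input buffer needs n + 1 samples because tap i is updated from sample i + 1.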
The assembly code is presented below.

    f_nlmsfiltering:
        push r4-r11
        .align 8
        makea pr , %abs16to31(sfilter)
        ldf r5 , @( p1 + 0 )
        makec lc0 , 127
        ldf r6 , @( p0 + 2 )
        morea pr , %abs0to15(sfilter)
        ldf r7 , @( pr + 0 )
        ldf r8 , @( pr + 2 )
        mafrchh r7 , r7 , r6 , r5
        sdf @( pr + 0 ) , r7
        ldf r6 , @( p0 + 4 )
        addha p1 , p0 , 6
        setuls0 start_loop
        mafrchh r6 , r8 , r6 , r5
        setle0 end_loop-16
        sdf @( pr + 2 ) , r6
        ldf r9 , @( pr + 4 )
        ldf r10 , @( p0 + 0 )
        addwa pr , pr , 4
        ldf r11 , @( p0 + 2 )
        mpfchh r2 , r7 , r10
        ldf r8 , @( p0 + 4 )
        movehl r11 , r8
        .align 16
        .presliw
        sliwmd
    start_loop:
        ldhsw r4 , @( p1 !+ 4 )
        nop
        nop
        mafrchh r1 , r9 , r4 , r5
        ldf r9 , @( pr + 4 )
        ldf r10 , @( pr + 2 )
        mafchh r2 , r2 , r6 , r11
        mafrclh r6 , r10 , r4 , r5
        mafchl r2 , r2 , r1 , r11
        extw r11 , r4
        sdf @( pr + 2 ) , r6
        sdf @( pr !+ 4 ) , r1
    end_loop:
        ldh r0 , @( p0 + 510 )
        ldh sr , @( pr - 2 )
        mafcll r1 , r2 , sr , r0
        nop
        nop
        nop
        sdf @( p2 + 0 ) , r1
        gp32md
        poprts r4-r11
    .leave f_nlmsfiltering
10 - Conclusion

During single-talk periods, the computational cost is 2.32 + 3.2 = 5.5 MCPS (front-end + NLMS filtering); during double-talk periods, it is reduced to 2.32 MCPS (front-end only). Single-talk periods represent the main part of a conversation, whereas double-talk periods occur only in brief, limited instances.
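The load figures in this note come from two simple computations: the cycle count of a loop divided by its number of iterations, and the cycles per sample multiplied by the 8 kHz sampling rate. The C sketch below reproduces that arithmetic. The function names are hypothetical and only illustrate the calculation. The inputs used in the usage note (390 cycles for 128 iterations, 402 cycles per sample, 2.32 MCPS for the front-end) are the values quoted in the text.

```c
#include <assert.h>

/* Average number of cycles per loop iteration, from the difference
 * between two "cycle" readings taken around the loop. */
double cycles_per_iteration(unsigned loop_cycles, unsigned iterations)
{
    assert(iterations > 0);
    return (double)loop_cycles / (double)iterations;
}

/* Load in megacycles per second for a given per-sample cycle cost. */
double mcps(unsigned cycles_per_sample, unsigned sample_rate_hz)
{
    return (double)cycles_per_sample * (double)sample_rate_hz / 1e6;
}

/* Single-talk load: front-end cost plus NLMS filtering cost. */
double single_talk_mcps(double front_end_mcps,
                        unsigned nlms_cycles_per_sample,
                        unsigned sample_rate_hz)
{
    return front_end_mcps + mcps(nlms_cycles_per_sample, sample_rate_hz);
}
```

For example, cycles_per_iteration(390, 128) gives 3.046875 cycles per iteration, and mcps(402, 8000) gives 3.216 MCPS for the filtering function. Passing 2.32 as the front-end cost to single_talk_mcps adds the two contributions.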
11 - Annex

11.1 - ASM generated by GHS+LAO (nlmsfiltering function)

    f_nlmsfiltering:
        .align 8
    .lao2:
        g15? makec lc0 , 128
        g15? makea pr , %abs16to31(sfilter)
        setuls0 .l55
        g15? addha p0 , p0 , 2
        setle0 .lao5-4
        g15? morea pr , %abs0to15(sfilter)
        g15? make r0 , 0
        .align 16
    .l55:
        g15? ldh r2 , @( p1 + 0 )
        g15? ldh r1 , @( p0 + 0 )
        g15? mafcll r1 , sr , r1 , r2
        g15? ldh r2 , @( pr + 0 )
        g15? shrw r1 , r1 , 16
        g15? addu r1 , r2 , r1
        g15? sdh @( pr !+ 4 ) , r1
        g15? ldh r2 , @( p0 + 2 )
        g15? ldh r12 , @( p1 + 0 )
        g15? mafcll r12 , sr , r2 , r12
        g15? ldh r2 , @( p0 - 2 )
        g15? mafcll r2 , r0 , r1 , r2
        g15? ldh r0 , @( pr - 2 )
        g15? shrw r1 , r12 , 16
        g15? addu r0 , r0 , r1
        g15? sdh @( pr - 2 ) , r0
        g15? ldh r12 , @( p0 !+ 4 )
        g15? mafcll r0 , r2 , r0 , r12
        .align 8
    .lao5:
        .align 8
    .lao10:
    .l56:
        g15? subha p0 , p0 , 2
        g15? sdf @( p2 + 0 ) , r0
        g15? make r0 , 0
        g15? rts
    .lao1:
    .leave f_nlmsfiltering
Acronyms and definitions

- AEC: Acoustic Echo Cancellation/Canceller
- DT: Double Talk
- ERLE: Echo Return Loss Enhancement
- ETSI: European Telecommunications Standards Institute
- GHS: Green Hills
- LAI: Linear Assembly Input language
- LAO: Linear Assembly Optimizer
- MCPS: Megacycles per second
- NLMS: Normalized Least Mean Squared
- TCL: Terminal Coupling Loss

References

[1] ST120 DSP Tool Set User's Guide, available on www.st.com/st100/.
[2] ST120 DSP-MCU Core Reference Guide, available on www.st.com/st100/.
[3] ST120 DSP-MCU Instruction Set Reference Guide, available on www.st.com/st100/.
[4] ST120 DSP-MCU Programming Manual, available on www.st.com/st100/.

Information furnished is believed to be accurate and reliable. However, STMicroelectronics assumes no responsibility for the consequences of use of such information nor for any infringement of patents or other rights of third parties which may result from its use. No license is granted by implication or otherwise under any patent or patent rights of STMicroelectronics. Specifications mentioned in this publication are subject to change without notice. This publication supersedes and replaces all information previously supplied. STMicroelectronics products are not authorized for use as critical components in life support devices or systems without express written approval of STMicroelectronics.

The ST logo is a registered trademark of STMicroelectronics.
© 2001 STMicroelectronics - All rights reserved.

STMicroelectronics group of companies: Australia - Brazil - China - Finland - France - Germany - Hong Kong - India - Italy - Japan - Malaysia - Malta - Morocco - Singapore - Spain - Sweden - Switzerland - United Kingdom - U.S.A.
http://www.st.com
